Mobile apps with augmented reality (AR) can enhance the visualization of a scene or environment.
Apps supported by computer-aided design with 3D models, whether delivered as websites or mobile
apps, make a design look more realistic. However, current online shopping platforms are quite
limited and lack 3D visualization features. This paper presents the development of a mobile
application, a pro-visualizer app called PRO-VAS, that utilizes AR for scanning and visualizing
the environment. PRO-VAS acts as a product visualizer that applies visual simultaneous
localization and mapping (VSLAM) to localize the product in AR-based systems. The main components
of PRO-VAS are ARCore from Google for interaction, and depth mapping from a red green blue depth
(RGB-D) phone camera with a point plane generator and a markerless tracking method. The last
component of the app is the set of objects from the Unity Asset Store, which can be chosen in
PRO-VAS for the scanned scene area. The app was tested in various environments involving
different objects and has shown competitive results. In the future, more features and products
can be added to the app.


1. INTRODUCTION


Mobile devices are used by most people in the world. Developing an augmented reality (AR)
enabled system on mobile devices can let people learn more interactively than they can with camera
filters alone. Realism is one of the main factors in the ideation and approach of such a system.
Interactions between the real world and the virtual world also show how virtual objects can behave
as real objects do, which is the basis for executing AR [1]. A broad range of technology now brings
humans and machines together easily. The world is connected without restrictions through online
platforms. Everything is at the user's fingertips, which gives users full control over what they
want. What users need is an application that genuinely helps in their daily life, such as an app
for dermatological diagnosis [2] that assists users in identifying the main skin diseases in three
main languages, or a child observation and location tracking system [3].

Our paper proposes an app that can visualize objects and provide mapping of the scene.
The interaction between the real world and virtual reality is based on pose estimation, which
relies on points, lines, distances and line formations to estimate the geographical data of the
environment [4]. Visual simultaneous localization and mapping (VSLAM) is usually used in robotics
to localize the robot and create maps simultaneously so that it can move around the environment
[5], [6]. Depth cameras are widely used to produce red green blue depth (RGB-D) mapping, even
though some phones, such as the iPhone 12 Pro, carry a LiDAR sensor. Commonly, the simultaneous
localization and mapping (SLAM) features applied in applications are limited in reading the visual
images and points of the environment, especially in mapping the environment data [6], [7].
Meanwhile, most applications that use markerless tracking do not achieve a realistic view.

This work aims to produce a mobile app that can scan and visualise the environment in real time,
thus helping the user to choose the right furniture or products for their interior design.
The contributions of this paper are: 1) the use of VSLAM for localization of the object; and 2) the
proposed components for the app, namely AR, VSLAM, point plane, RGB-D mapping and the markerless
tracking method. These components create distinct characteristics compared to existing similar
mobile apps. The remainder of this paper is organised as follows: section 2 describes past studies
and presents similar mobile apps in the market. Section 3 presents the components of the
pro-visualizer app (PRO-VAS), while section 4 discusses the development. The results and findings
of the study are presented in section 5 and, finally, section 6 concludes the paper.


2. SIMILAR MOBILE APPS AND LIMITATIONS 
2.1. IKEA Place 

IKEA practices the concept of minimalism, which can be applied to most homes in the world.
Over time, IKEA has collected its company data into a database and made it available to
mobile applications with augmented reality [8] in the IKEA catalog and IKEA store. IKEA wanted to
ensure that customers experience new ways of buying products with the aid of augmented reality [9].
The IKEA Place app has proven to be a highly creative answer to the practical need of customers to
view and survey a product before visiting the physical shop. With a list of 2000 objects and
accessories, the application was shown to help customers choose suitable products for their
houses [10], [11].


2.2. Houzz 

Houzz is another interior design mobile application that uses AR, with idea trends and the price of
every product displayed in the application, as shown in Figure 1(a). Its commenting features
increase user engagement and can serve as a reference point that helps users make the right
choice [11]. A newer feature can detect the floor orientation, so users can estimate the amount of
tile needed for their house. The developers have also expanded the functionality of the app so
that users can decorate their walls using the vertical plane detection feature [12].


2.3. Intiaro 

Intiaro is a platform where users can buy and sell their products in the mobile application, as
shown in Figure 1(b), which makes the functionality more interesting. A group of developers at
Intiaro can produce 3D model designs for users throughout the world, including business
organizations and individuals. The functionality supports entrepreneurs in the furniture and
interior design industries [13]. The 3D digitization and AR mobile application make all furniture
in the application unscalable, which keeps the application realistic because the measurements have
to be exactly the same as those of the real products [14], with high-level visualization.


Figure 1. Interface of (a) Houzz and (b) Intiaro apps


3. THE PROPOSED COMPONENTS OF PRO-VAS APP

In this study, AR plays a major part in producing a good visualization. Many companies now use
AR technology in their mobile applications to help users and customers make buying decisions.
For the development of the app, the main components identified are AR, VSLAM, point plane,
RGB-D mapping and the markerless tracking method. Another important consideration is that the
processes involved in the app should help users access the functions effectively [2], [3] for
user satisfaction [15].


3.1. AR 

One method of applying AR is to use ARCore, which tracks the position of the mobile device as it
moves and builds an understanding of the surrounding environment. It can also detect points
whenever the mobile device moves. Contour detection for the wall paint is developed using OpenCV,
a computer vision library with Python bindings [16]. Among the important classes in AR systems are
task focus, nature of augmentation, and OP-a-S for interaction modelling [1]. The nature of
augmentation can be divided into two parts, execution and evaluation. In the execution part, the
user can perform many quality tasks, while the evaluation part is based on user perception, in
which more realistic information is provided to the user [17].


3.2. VSLAM 

The most important component in this project is VSLAM. VSLAM is used to identify visual
images, angles and points on a surface. It is an algorithm usually used on robots so that they can
navigate through the environment with the help of a vision sensor [18]. For example, VSLAM allows
a cleaning robot to navigate and clean larger spaces in straight lines. The challenge in the VSLAM
algorithm lies in the dynamic illustration of multi-segmentation targets in the scene captured by
the camera sensors [19]. The VSLAM algorithm consists of several main components, which are
feature extraction, feature matching, pose estimation, pose optimization and map updating, as
shown in Figure 2. Feature extraction extracts features from every image captured as the camera
sensor moves and changes its position. Feature matching matches the features extracted from the
RGB-D camera frames, which forms the basis of the map. Pose estimation estimates the camera
rotation based on position and angle. Pose optimization minimizes the error made when estimating
the coordinates of the camera pose. Map updating updates the map for every camera orientation
[19]. SLAM is a method to map the environment and represent it as a collection of points. To take
advantage of the recent success of graph-based approaches to SLAM, the framework is built on an
advanced feature-based SLAM system, ORB-SLAM, which relies on oriented FAST and rotated BRIEF (ORB)
features. This allows us to take advantage of the sparsity of the outline while at the same time
incorporating semantically important geometric primitives, such as planes, within the outline
[7], [20], [21].


[Figure 2 flow: input image, feature extraction, feature matching, pose estimation, pose optimization, map updating]

Figure 2. VSLAM algorithm main components, from input image to map updating
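
The pose estimation stage can be illustrated with a minimal sketch. The Python below is not the
PRO-VAS or ARCore implementation; it assumes that feature matching has already produced pairs of
corresponding points, simplifies the problem to 2D, and recovers the rotation and translation
between two frames in closed form by least squares. The function name estimate_pose_2d is
hypothetical.

```python
import math

def estimate_pose_2d(prev_pts, curr_pts):
    """Estimate the rotation (radians) and translation mapping prev_pts onto curr_pts.

    Least-squares rigid alignment of matched 2D points. This is a 2D
    simplification of pose estimation; a real VSLAM system estimates 3D poses.
    """
    if len(prev_pts) != len(curr_pts) or len(prev_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(prev_pts)
    # Centroids of both point sets.
    pcx = sum(x for x, _ in prev_pts) / n
    pcy = sum(y for _, y in prev_pts) / n
    ccx = sum(x for x, _ in curr_pts) / n
    ccy = sum(y for _, y in curr_pts) / n
    # Accumulate dot and cross products of the centered point pairs.
    dot = 0.0
    cross = 0.0
    for (px, py), (cx, cy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = cx - ccx, cy - ccy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated previous centroid onto the current one.
    tx = ccx - (c * pcx - s * pcy)
    ty = ccy - (s * pcx + c * pcy)
    return theta, (tx, ty)
```

Pose optimization, the next stage in Figure 2, would refine such an estimate over many frames
rather than a single pair.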


3.3. RGB-D mapping 

RGB-D mapping is required to provide dense 3D mapping for the system in indoor environments.
Such detection and mapping of objects in indoor environments could help in many situations, such
as detecting earthquakes and catastrophic events. A typical setup comprises a set of RGB-D cameras
positioned at equal height, encompassing a volume in which people and objects in the scene are
localized and followed. The calibration of the cameras uses the RGB camera of each sensor alongside
the common framework design. A large grid is placed on the floor at a single position visible to
all cameras, and images are acquired and calibrated using ordinary grid-based calibration. The
depth camera collects a dense point cloud and turns it into 3D modelling data that describes the
points around the environment. The sparse features are then passed to random sample consensus
(RANSAC) to estimate the geometrical transformation for computer vision and perception. A point
cloud map is then built with global optimization based on the geometrical point estimation. The
mobile device then uses the map to generate planes [22].
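
To make the RANSAC idea concrete, the following minimal Python sketch applies it to a simpler task
than the transformation estimation described above: fitting a single plane to a 3D point cloud,
which relates to the final plane generation step. It is an illustration under stated assumptions,
not the method used in PRO-VAS or ARCore; the function names, threshold and iteration count are
hypothetical.

```python
import random

def plane_from_points(p, q, r):
    """Return (normal, d) of the plane n.x + d = 0 through three points, or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The normal is the cross product of two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Fit a plane to a list of 3D points with RANSAC.

    Returns (normal, d, inlier_indices) for the candidate with the most inliers.
    """
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iterations):
        # Hypothesize a plane from a random minimal sample of three points.
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        # Count the points whose distance to the plane is within the threshold.
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(n[0] * x + n[1] * y + n[2] * z + d) <= threshold]
        if len(inliers) > len(best[2]):
            best = (n, d, inliers)
    return best
```

Because each hypothesis is built from only three points, outliers such as furniture or sensor
noise affect a candidate only when they happen to be sampled, which is what makes the approach
robust.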


3.4. Markerless tracking method

The markerless tracking method plays a major part in this study. Its main strength is that it can
work from visual images or global positioning system (GPS) data and sense them using only angles
and points. Earlier hardware setups in augmented reality required the user either to wear bulky
computing devices or to be tethered to them by a flexible cable. To overcome these problems, a
markerless tracking system was deployed in combination with a lightweight mobile setup [23].
The workflow of the markerless tracking method consists of video capturing, point tracking, planar
object detection and rendering. Currently, it is hard to find a system that can replicate the
markerless tracking methods applied in robotics. Nevertheless, markerless tracking methods can be
implemented in many systems, such as multimedia-based applications [2].


4. PRO-VAS DEVELOPMENT METHOD 
4.1. VSLAM application in AR 

The ARCore session was set to activate the back-facing camera with the AR controller and the AR
camera configuration. With this configuration, the back-facing camera of the device activates these
two functionalities after the user has granted the app camera access. The development of PRO-VAS
involved the VSLAM application together with RGB-D mapping and point plane SLAM [19], [24]. The
SLAM feature produces a map of the environment. Users need to trace the whole environment and put
an object in the space. Figure 3 shows the RGB-D mapping flowchart for the app. For this study, the
RGB-D camera used was that of a Pixel 3 XL, which supports depth sensing and can detect lines,
points and planes. If the camera is able to detect any line, point or plane, it proceeds to pose
graph optimization for the detected surfaces. If nothing is detected, the RGB-D camera continues to
scan the environment. After pose graph optimization, the mapping process continues. The mapping
process builds a map in which the app can identify suitable locations to generate the 3D object.
After that, if the camera is oriented in another direction, the RGB-D camera starts the process
over, similar to the description in [25]. If the camera is not oriented in another direction, the
process ends. Further explanation is given in section 4.2.
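
The decision flow described above can be sketched as a small loop. This is only an illustration of
the control flow, not code from the app: the frame dictionaries, their keys and the step names are
hypothetical stand-ins for what the camera and the AR framework would provide.

```python
def run_scan(frames):
    """Walk through the scan flow for a sequence of camera frames.

    Each frame is a dict with two hypothetical keys:
      "features":   detected lines, points or planes (an empty list if none)
      "reoriented": True if the camera then turns to another direction
    Returns the list of steps taken, in order.
    """
    steps = []
    for frame in frames:
        if not frame["features"]:
            steps.append("scan")  # nothing detected: keep scanning
            continue
        steps.append("pose_graph_optimization")
        steps.append("mapping")
        if not frame["reoriented"]:
            steps.append("end")  # camera keeps its direction: the process ends
            break
        steps.append("restart")  # new direction: start the process over
    return steps
```

For example, a frame with no features followed by a frame with a detected plane and a change of
direction yields a "scan" step, then optimization and mapping, then a restart.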


[Figure 3 flowchart: line, point and plane detection; pose graph optimization; mapping; orientation check]

Figure 3. PRO-VAS flowchart

Meanwhile, Figure 4 shows the three main screens of PRO-VAS. Because PRO-VAS depends heavily on
the RGB-D camera, the first screen notifies the user whether the device is supported, as in
Figure 4(a). If it is, the second screen appears, as in Figure 4(b), and guides the user to scan
the area for a suitable location for the 3D objects. Finally, the third screen appears for
choosing the product, as in Figure 4(c). Without the VSLAM application, the object would follow
the camera movement instead of staying in one place. Table 1 compares the components of PRO-VAS
with those of the apps discussed in the previous section.


Table 1. Comparison of PRO-VAS with other apps
Mobile app    AR     Plane detection    RGB-D mapping/VSLAM
PRO-VAS       Yes    Markerless         Yes
IKEA Place    Yes    No information     -
Houzz         Yes    Markerless         -
Intiaro       Yes    Markerless         -


Figure 4. PRO-VAS main screens with (a) popup message when the phone supports RGB-D mapping,
(b) hand animation to guide the user in using the camera, and (c) screen for choosing the object


4.2. Process design 

Interaction with the app starts by granting the system access to the phone camera before an object
is chosen, as shown in Figure 5(a). First, users need to scan the environment to detect flat
surfaces. Next, a collection of 3D objects that can be chosen appears on the screen of the phone.
As the app scans the environment, users gain full control of the chosen object by moving and
rotating it, so the app enables users to manipulate the scene.


Figure 5. PRO-VAS (a) process flow, (b) the detection of the object location with vertical plane,
(c) horizontal plane, and (d) the preview of RGB-D


The controller component scripts comprise the depth menu, instant placement menu, instant placement
prefab and first-person camera. The depth menu lets the user select whether to enable the depth
application programming interface (API) and the depth map. The depth API in this application
optimizes the AR experience for users. The instant placement menu script activates the object
prefab once the user generates the object in the real world. The first-person camera is the chosen
camera, namely the back-facing camera, which provides the view of the scene in which 3D objects are
generated. The plane discovery guide helps users explore the environment to detect planes. As shown
in Figure 4(b), a hand animation guides users in searching for surfaces on which to place objects.
The prefab used for the plane is the detected plane visualizer. The prefab is generated once the
camera detects plane surfaces in the environment. New planes are instantiated every time the camera
finds new surfaces, whether horizontal or vertical. Figure 5(b) and Figure 5(c) show the vertical
plane and the horizontal plane in the environment. The camera detects the planes as the user hovers
the phone over the surfaces of the environment. Figure 5(d) shows the RGB-D preview of the scene
through PRO-VAS on the phone.
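
The distinction between horizontal and vertical planes comes down to the direction of the plane
normal relative to gravity. The AR framework classifies detected planes itself, so the Python
sketch below only illustrates the underlying geometry. It assumes a y-up coordinate convention, as
used in Unity, a unit-length up vector, and a hypothetical angular tolerance.

```python
import math

def classify_plane(normal, up=(0.0, 1.0, 0.0), tolerance_deg=10.0):
    """Label a plane 'horizontal', 'vertical' or 'other' from its normal vector.

    `up` is assumed to be a unit vector; `normal` may have any non-zero length.
    """
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("normal must be non-zero")
    # Angle between the plane normal and the up axis, ignoring the normal's sign.
    cos_angle = abs(nx * up[0] + ny * up[1] + nz * up[2]) / norm
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    if angle <= tolerance_deg:
        return "horizontal"  # normal along the up axis: floor or table top
    if angle >= 90.0 - tolerance_deg:
        return "vertical"  # normal perpendicular to the up axis: wall
    return "other"
```

A floor with normal (0, 1, 0) is labelled horizontal and a wall with normal (1, 0, 0) is labelled
vertical, which corresponds to the two plane types on which PRO-VAS places objects.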


4.3. Product collection 

PRO-VAS provides a selection of products that the user can choose and place in the scene.
The RGB-D camera lets the product appear as a 3D object and supports environment tracing.
The 3D objects were downloaded from the furniture category of the Unity Asset Store, as shown
earlier in Figure 4(c). All fifteen customized objects were packaged together with their models,
materials, textures and prefabs and assembled as a complete set of 3D objects. Each object was
sized to match the actual size of the furniture.


5. RESULTS AND DISCUSSION 
5.1. Object on horizontal plane and vertical plane

Figure 6(a) shows a 3D object, a chair, placed on a carpet. The object casts some shadow because
it reflects some of the sunlight. The replication of sunlight was created based on the amount of
light received by the camera and applied to the 3D object. In Figure 6(b) and Figure 6(c), the 3D
objects, which are frames, appear on the vertical planes of the wall. The frame in Figure 6(b) is
at 180° about the y-axis, while the frame in Figure 6(c) is at 180° about the x-axis. The 3D
objects are applied once the camera detects the vertical surfaces. Figure 6(d) shows a camera angle
near the corner of the frame. There is no space between the frame and the wall, which means that
the frame is precisely attached and reacts to the environment.


Figure 6. Appearance of 3D objects in the PRO-VAS app in different views: (a) 3D object on
horizontal plane, (b) 3D object 180° at y-axis, (c) 3D object 180° at x-axis, and (d) closer look
at the 3D object


5.2. Object localization 

This section presents the augmented appearance to show the localization of an object when
the camera is oriented in another direction. The outcomes of object localization are shown in
Figure 7(a) for a table and in Figure 7(b) for a television. The objects were tested at different
locations and from different angles. The purpose of this testing was to determine whether the
object would remain in the same place if the camera was oriented in another direction. The
localization and mapping succeed if the objects remain where they were localized and a map can
still be built even when the camera is oriented in another direction. The testing was done in a
kitchen, a small size room, a master room, and a living room with 5 different customized objects,
which are televisions, an armchair, a floor light and a table. Objects in the master room show the
best texture and shaders because the room has good lighting, whereas objects in the kitchen show
less shading and texture because they do not receive enough light. The objects can also be put at
different places in the room rather than only on the floor.
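
The test criterion, that an object stays in place while the camera moves, can be illustrated with
a top-down 2D sketch. This is an assumption-laden illustration rather than the app's code: camera
poses are given as a position and a yaw angle, and the function names are hypothetical. With
tracking, the object's drawing position is recomputed from the current camera pose, so it stays at
its world location; without tracking, the first-frame position is reused and the object travels
with the camera.

```python
import math

def world_to_camera(point, cam_pos, cam_yaw):
    """Express a world-frame 2D point in the frame of a camera at cam_pos with heading cam_yaw."""
    dx, dy = point[0] - cam_pos[0], point[1] - cam_pos[1]
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    # Rotate by -cam_yaw to enter the camera frame.
    return (c * dx + s * dy, -s * dx + c * dy)

def camera_to_world(point, cam_pos, cam_yaw):
    """Inverse of world_to_camera."""
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    return (c * point[0] - s * point[1] + cam_pos[0],
            s * point[0] + c * point[1] + cam_pos[1])

def perceived_world_positions(anchor, poses, tracked):
    """Where the object appears in the world for each true camera pose (position, yaw)."""
    first_pos, first_yaw = poses[0]
    frozen = world_to_camera(anchor, first_pos, first_yaw)
    positions = []
    for pos, yaw in poses:
        # Tracked: redraw relative to the current pose. Untracked: reuse the first frame.
        in_camera = world_to_camera(anchor, pos, yaw) if tracked else frozen
        positions.append(camera_to_world(in_camera, pos, yaw))
    return positions

def max_drift(positions):
    """Largest distance of any position from the first one."""
    x0, y0 = positions[0]
    return max(math.hypot(x - x0, y - y0) for x, y in positions)
```

In this model, a tracked object shows essentially zero drift for any camera path, while an
untracked one drifts as soon as the camera translates or rotates.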


5.3. Market survey

We conducted a market survey to obtain user feedback on PRO-VAS. Forty respondents participated
in the survey, with the majority giving positive feedback. Sixty percent of respondents strongly
agree that they are interested in AR features, and 57 percent agree that the objects are visualized
clearly in PRO-VAS. The majority of respondents think the app is helpful or very helpful in
determining suitable furniture. A strong majority, 60 percent, intend to install the mobile
application, and the majority want a free app with the current features. Forty percent of
respondents are willing to spend RM5 to RM10 for more objects and buying features. Surprisingly,
8 percent are willing to pay more for more objects and an ad-free app. PRO-VAS is rated as a good
or excellent app by the majority. By providing the product measurements and visualizing the product
in the selected space, the app can save users' time in determining whether the product fits.


[Figure 7 panels for each object: kitchen, small size room, master room, living room]

Figure 7. Location testing for two objects: (a) table and (b) television


6. CONCLUSION 

This paper presents the capability of AR objects to appear in the environment by using VSLAM.
The tracking method used in this study was markerless tracking, in which the camera does not depend
on a tracking marker to create the object. In this way, the study shows that AR and VSLAM can
connect virtual reality and the real world, the so-called mixed reality. Simultaneous localization
and mapping helps users to use this application and apply it in real time. Based on the testing,
good hardware and an operating system that support the application make the VSLAM experience run
smoothly. Previously, hardware such as Microsoft Kinect, which is very expensive, was needed to do
the mapping. Now, however, phone cameras with a built-in depth camera ease the process of mapping
the environment. This study has therefore achieved the application of VSLAM in mobile apps with AR,
and the app performs smoothly according to user feedback. Google ARCore helps considerably in
developing the system, as it offers many features that help developers create and express ideas for
good multimedia-based applications such as AR applications. The markerless tracking method is
preferable to marker-based tracking when the purpose is wide tracking without restriction on
scanning visual images and the prevention of object occlusion. The RGB-D camera helps create a map
that VSLAM uses to understand the environment better.